Abstract

Robots are increasingly introduced to work in concert with
people in high-intensity domains, such as manufacturing,
space exploration and hazardous environments. Although there
are numerous studies on human teamwork and coordination in
these settings, very little prior work exists on applying
these models to human-robot interaction. This paper presents
results from ongoing work aimed at translating qualitative
methods from human factors engineering into computational
models that can be applied to human-robot teaming. We
describe a statistical approach to learning patterns of
strong and weak agreements in human planning meetings that
achieves up to 94% prediction accuracy.

We also formulate a human-robot interactive planning method
that emulates cross-training, a training strategy widely
used in human teams. Results from human subject experiments
show statistically significant improvements on team fluency
metrics, compared to standard reinforcement learning
techniques. Results from these two studies support the
approach of modeling and applying common practices in human
teaming to achieve more effective and fluent human-robot
teaming.

Introduction 

We envision a future in which collaboration between humans
and robots will be indispensable to our work in many
domains, ranging from manufacturing to surgery to space
exploration. The success of these systems will depend in
large part on the ability of robots to integrate with
existing human teams. Our goal is to develop robot partners
that we can work with more easily and naturally, as inspired
by the way we work with other people.

Our hypothesis is that the performance of human-robot teams
is improved when a robot teammate emulates the effective
teamwork behaviors observed in human teams. There is a
precedent for human-human interaction (HHI) research
informing the design of human-robot interaction (HRI), e.g.
(Lockerd and Breazeal 2004; Sakita et al. 2004; Sidner et
al. 2005; Trafton et al. 2005; Hoffman and Breazeal 2007).
We draw from a body of HHI research that has not yet been
widely applied to HRI: studies in human teaming for
high-intensity domains, including studies of military
tactical teams (Entin, Serfaty, and Deckert 1994; Entin and
Serfaty 1999; Stout et al. 1999), aviation crews (Salas et
al. 1999), medical teams (Mackenzie, Xiao, and Horst 2004),
and disaster response crews (Stent 2000). Team dynamics have
a significant impact on performance within these domains,
producing a strong incentive for teams to understand and
apply the communication and coordination strategies that
improve performance.

This paper presents recent results from ongoing work aimed
at translating qualitative methods from human factors
engineering and human team coordination into computational
models that can be applied to human-machine team planning.
We describe two studies. The first study presents a
statistical approach to learning patterns of strong and weak
agreements in human planning meetings. Our approach applies
statistical machine learning to dialog features, which prior
studies in cognitive psychology have shown qualitatively
capture the level of commitment to plan choices. Initial
results indicate that we can achieve up to 94% average
accuracy in predicting strong and weak agreement. This work
is the first step towards designing an intelligent robot
partner that participates in natural human team planning
sessions, encouraging team members to revisit decisions that
may adversely affect the team’s performance, and spurring
dialog that results in higher quality plans.

The second study describes the design and evaluation of
computational teaming models that support human-robot
cross-training. Cross-training is a technique widely used in
human teams, whereby team members iteratively switch roles
on the team. This method has been empirically validated to
improve mental model similarity among human team members and
to improve team performance measures. We formulate
human-robot cross-training and evaluate it in a user study
(n = 36). Results show statistically significant
improvements on quantitative human-robot mental model
elicitation measures and teamwork fluency metrics, compared
to standard reinforcement learning techniques where the
human assigns rewards to the robot. Additionally,
significant differences emerge in subjective measures
related to perceived robot performance and human trust.
These results support the approach of modeling and applying
common practices in human teaming to achieve more effective
and fluent human-robot teaming.
Study: Quantitative Prediction of Strength of
Agreement in Human Team Planning 

Goal-oriented meetings are frequent occurrences in our
everyday lives. For example, we discuss project plans at
work and coordinate with friends to organize outings. The
consequences are inconvenient but relatively minor if the
participants leave these types of discussions with different
understandings of what was decided upon. Yet, in
high-intensity domains such as disaster response, minor
differences in understandings may degrade the team’s ability
to successfully coordinate and may have serious consequences
for people’s safety.

It is challenging for a group to reach consensus through
dialog. The interaction among people is complicated, dynamic
and unpredictable. Natural human collaborative dialog
unfolds in cycles, agreements are fluid, and proposals are
often implicitly communicated and accepted (Eugenio et al.
1999). In addition, there are social aspects that are often
hard to formulate into equations. These characteristics make
explicit modeling and quantitative analysis of goal-oriented
dialog challenging.

Prior art provides a theoretical foundation for translating
the ambiguous and inconsistent nature of dialog into a set
of dialog features that indicate the strength of agreement
to plan decisions (Eugenio et al. 1999). In our recent work,
we generalize this qualitative approach to enable a
quantitative, predictive capability for characterizing weak
and strong agreement in dialog. We envision this work as a
first step towards the design of an intelligent agent or
robot that observes human team planning and interjects to
highlight weak agreement among team members. This approach
would integrate an intelligent agent into natural human team
dynamics and does not require extensive codifying of domain
knowledge. In contrast, prior approaches to decision support
in planning utilize automated planners to provide
suggestions to the team. A common criticism of this approach
is that automated planners cannot practically capture all
relevant domain knowledge and expertise and frequently
provide uninformative solutions.

Our approach to predicting strength of agreement builds on
prior, qualitative investigations of the human team
decision-making process (Eugenio et al. 1999; Black 1948;
Hiltz, Johnson, and Turoff 1986). In particular, Eugenio et
al. (1999) present an empirical study of the use of dialog
features for characterizing strength of agreement. The
dialog features capture the level of joint commitment as the
negotiation unfolds, from an inability to commit due to lack
of full information (partner decidable option), to
conditional commitment (proposal), to unconditional
commitment. Each of these dialog features is composed in
part of traditional dialog acts, which represent the
intention or the role of an utterance in dialog. Dialog acts
are widely used in natural language processing systems for
training and testing (Nagata and Morimoto 1994; Stolcke et
al. 2000; Ji and Bilmes 2005). Eugenio et al. argue, and we
validate quantitatively, that dialog acts (in particular,
accept and reject) are not sufficient for categorizing
strength of agreement in dialog. This is because dialog acts
only consider one agent’s attitude toward an action and
therefore do not provide information on the level of joint
commitment. In addition, the traditional dialog acts are not
able to capture implicit accepts and rejects in the dialog.

In our recent work, we utilize both dialog acts and
Eugenio’s dialog features within a machine learning
framework to quantitatively predict strength of agreement. A
key benefit of using dialog acts and features to
characterize strength of agreement is that this approach
does not require the extraction of keywords or other content
information from the dialog. In other words, this approach
uses information about how the team plans, but does not
require storing and processing potentially sensitive
information about what they are planning, as is the case for
previous quantitative approaches (Hahn, Ladner, and
Ostendorf 2006; Hillard and Ostendorf 2003) to this problem.
To our knowledge, our approach is the first to (1) estimate
strength of agreement based solely on dialog acts and
features and not keywords, and to (2) map the qualitative
theoretical foundations for strength of agreement to a
quantitative, predictive measure.

We apply a number of statistical machine learning techniques
to predict strength of agreement, including SVMs, logistic
regression, k-means, and expectation maximization with
Gaussian mixture models. We show that we can achieve up to
94% correct prediction of the strength of agreement without
using keywords. We also show that Eugenio et al.’s dialog
features play a significant role in prediction accuracy, as
compared to analysis using dialog acts alone. We apply SVM
feature ranking and Fisher’s exact test to analyze the
significance of association between classification features
(which include Eugenio’s features and dialog acts) and
strength of agreement. The SVM ranking test indicates that
Eugenio’s features Proposal and Partner Decidable Option
rank as the top two classification features for the weak
agreement category, and that Eugenio’s feature Unendorsed
Option ranks among the top two classification features for
the strong agreement category. Fisher’s exact test, a method
that is independent of classification technique, indicates
that three out of four of Eugenio’s features are ranked
among the top five classification features. These results
lend support to the hypothesis that Eugenio’s features play
a significant role in the prediction task.
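
As a rough illustration of the kind of association analysis described above, the following sketch computes a two-sided Fisher's exact test p-value for a 2x2 contingency table that crosses the presence of one binary classification feature with the agreement label. The function name, the table layout, and the example counts are assumptions made for illustration; they are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows are feature present / absent,
    columns are strong / weak agreement.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # given that all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table that is no more likely
    # than the observed one (small tolerance for float comparison).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Made-up counts: feature present in 2 strong and 12 weak agreements,
# absent in 15 strong and 6 weak agreements.
p_value = fisher_exact_two_sided(2, 12, 15, 6)
```

A small p-value here would suggest that the feature and the agreement label are associated, independent of any particular classifier. Ranking features by such p-values is one plausible reading of the procedure, not a confirmed description of it.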

Study: Human Team Training as a Guide for 
Human-Robot Team Training 

We propose a novel framework that uses insight from prior
art in human team coordination and shared mental models to
increase the performance of human-robot teams
collaboratively executing complex tasks. Shared mental
models (SMMs) (Cooke, Salas, and Cannon-Bowers 2000) are
measurable models developed among team members prior to task
execution and are strongly correlated with team performance.
Although numerous studies have modeled the
performance-linked characteristics of SMMs in human team
coordination, very little prior work exists on applying
these models to a human-robot interaction framework. We
propose that valuable insights can be drawn from these
works. For instance, a study evaluating teamwork in flight
crews (Mathieu et al. 2005) has shown that teams with
accurate but different mental models among team members
perform worse than teams having less accurate but common
models. Applying this insight to human-robot teaming leads
to a hypothesis that, to promote effective teamwork, a robot
must execute a task plan that is similar to the human
partner’s mental model of the execution.

In our recent work, we design and evaluate a framework that
leverages methods from human factors engineering to promote
the development of teaming models that are shared across
human and robot team members. Our approach to human-robot
team training uses a cross-training phase (Stout et al.
1999; Volpe et al. 1996; Marks et al. 2002), which precedes
task execution. There are three types of cross-training
(Blickensderfer E. and E. 1998): (a) positional
clarification, (b) positional modeling, and (c) positional
rotation. Findings (Marks et al. 2002; Volpe et al. 1996;
Cannon-Bowers J.A. 1998) suggest that positional rotation,
which is defined as “learning interpositional information by
switching work roles”, is the most strongly correlated with
improvement in team performance, as it “provides hands on
approach to learning interpositional information by giving
members experience on carrying out teammates’ duties through
active participation in each member’s role” (Marks et al.
2002). The goal of positional rotation is to provide the
individual with hands-on knowledge about the roles and
responsibilities of other teammates, with the purpose of
improving interrole knowledge and team performance.

We emulate positional rotation in human teams by having the
human and robot iteratively switch roles. We call the phase
in which the roles of the human and robot match those of the
actual task execution the forward phase, and the phase in
which the human and robot roles are switched the rotation
phase. In order for the robot’s computational teaming model
to converge to the human mental model:

1. The robot needs to have an accurate estimate of the
human’s role in performing the task, and this needs to be
similar to the human’s awareness of his or her own role.
Based on the above, we use the human-robot forward phase of
the training process to update our estimation of the
transition probabilities that encode the expected human
behavior.

2. The robot’s actions need to match the expectations of the
human. We accomplish this by using the human inputs in the
rotation phase to update the reward assignments.
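
The two updates listed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' formulation: the class name, the count-based transition estimate with a uniform prior, and the fixed additive reward increment are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class CrossTrainingModel:
    """Hypothetical tabular teaming model updated in two phases."""

    def __init__(self, states, prior=1.0, reward_step=1.0):
        self.states = list(states)
        self.prior = prior              # assumed uniform pseudo-count
        self.reward_step = reward_step  # assumed fixed reward increment
        # counts[(state, robot_action)][next_state] -> observations
        self.counts = defaultdict(lambda: defaultdict(float))
        # reward[(state, robot_action)] -> learned reward
        self.reward = defaultdict(float)

    def forward_update(self, state, robot_action, next_state):
        # Forward phase: the human acts in their own role, so the observed
        # next state reflects the human's behavior after the robot's action.
        self.counts[(state, robot_action)][next_state] += 1.0

    def transition_prob(self, state, robot_action, next_state):
        # Smoothed relative frequency of the observed transitions.
        c = self.counts[(state, robot_action)]
        total = sum(c.values()) + self.prior * len(self.states)
        return (c.get(next_state, 0.0) + self.prior) / total

    def rotation_update(self, state, demonstrated_action):
        # Rotation phase: the human performs the robot's role, and the
        # demonstrated action is treated as a positive reward signal.
        self.reward[(state, demonstrated_action)] += self.reward_step


model = CrossTrainingModel(states=["s0", "s1"])
model.forward_update("s0", "place", "s1")
model.rotation_update("s0", "place")
```

A planner could then be run on the resulting transition and reward tables after each round of training. How the rewards are actually scaled and how the policy is recomputed are not specified here.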

We computationally encode the human-robot teaming model as a
Markov Decision Process (MDP) and show that our formulation
captures knowledge about the role of the robot and human
team member and is quantitatively comparable to the human
mental model. Additionally, we propose quantitative measures
to assess human-robot mental model convergence, as it
emerges through a training process, and mental model
similarity. We then introduce a human-robot interactive
planning method which uses the MDP computational teaming
model to emulate cross-training (Marks et al. 2002).


We compare human-robot cross-training to standard
reinforcement learning algorithms through a large-scale user
study of 36 human subjects. Specifically, we compare the
proposed formulation to the standard interactive
reinforcement learning approach, where the reward signal is
provided by a human teacher or coach (Russell and Norvig
2003). In this work, we benchmark against the reinforcement
learning algorithm Sarsa(λ) with greedy policy (Sutton and
Barto 1998). We chose Sarsa(λ) for its popularity and
applicability in a wide variety of tasks. In particular,
Sarsa(λ) has been used to benchmark the TAMER framework
(Knox and Stone 2009), as well as to test TAMER-RL (Knox and
Stone 2010; 2012). Furthermore, if we remove eligibility
traces (by setting λ = 0), our implementation of Sarsa(λ)
with greedy policy is identical to Q-Learning with
Interactive Rewards (Thomaz and Breazeal 2006).
Additionally, variations of Sarsa have been used to teach a
mobile robot to deliver objects (Ramachandran and Gupta
2009), for navigation of a humanoid robot (Navarro, Weber,
and Wermter 2011), as well as in an interactive learning
framework, where the user gives rewards to the robot through
verbal commands (Tenorio-Gonzalez, Morales, and Villasenor
Pineda 2010).
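
For readers unfamiliar with the baseline, the sketch below shows one step of tabular Sarsa(λ) with accumulating eligibility traces and a greedy action choice, where the reward is assumed to come from a human teacher. It follows the textbook form of the algorithm rather than the implementation used in the study; the function names, the default step size, discount, and trace decay, and the omission of terminal-state handling are all assumptions.

```python
from collections import defaultdict


def greedy_action(q, state, actions):
    # Greedy policy: pick the action with the highest current Q-value.
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_lambda_step(q, traces, s, a, reward, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.5):
    """Apply one Sarsa(lambda) update in place and return the TD error.

    q and traces are dicts keyed by (state, action); reward is the
    human-provided reward for taking action a in state s.
    """
    delta = reward + gamma * q[(s_next, a_next)] - q[(s, a)]
    traces[(s, a)] += 1.0  # accumulating trace for the visited pair
    for key in list(traces):
        q[key] += alpha * delta * traces[key]
        traces[key] *= gamma * lam  # decay every trace
    return delta


q = defaultdict(float)
traces = defaultdict(float)
actions = ["wait", "move"]
a_next = greedy_action(q, "s1", actions)
sarsa_lambda_step(q, traces, "s0", "move", 1.0, "s1", a_next)
```

With `lam=0` every trace decays to zero after each step, so only the most recently visited state-action pair is updated. Because the next action is chosen greedily, the update target then coincides with the one-step Q-learning target, which is consistent with the equivalence noted above.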

Our experimental results show that cross-training improves
quantitative measures of human-robot mental model
convergence (p = 0.04) and mental model similarity
(p = 0.03), as compared to Sarsa(λ). Additionally, a
post-experimental survey shows statistically significant
differences in the perceived robot performance, as well as
in trust in the robot (p < 0.01). Finally, we observed a
significant improvement in team fluency metrics, such as an
increase of 71% in concurrent motion (p = 0.02) and a
decrease of 41% in human idle time (p = 0.04), during the
actual human-robot task execution phase that followed the
human-robot interactive planning process.
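
To make the two fluency metrics concrete, the sketch below computes them from annotated motion intervals. The definitions are assumptions for illustration: concurrent motion is taken to be the total time during which both the human and the robot are moving, and human idle time is the task duration minus the time the human spends moving. The function names and the interval data are hypothetical, and each agent's intervals are assumed not to overlap one another.

```python
def overlap(intervals_a, intervals_b):
    # Total time covered by both interval lists; each list is assumed
    # to contain non-overlapping (start, end) pairs in seconds.
    total = 0.0
    for a_start, a_end in intervals_a:
        for b_start, b_end in intervals_b:
            total += max(0.0, min(a_end, b_end) - max(a_start, b_start))
    return total


def fluency_metrics(human_motion, robot_motion, task_start, task_end):
    # Time the human spends moving during the task.
    human_active = sum(end - start for start, end in human_motion)
    return {
        "concurrent_motion": overlap(human_motion, robot_motion),
        "human_idle_time": (task_end - task_start) - human_active,
    }


# Made-up annotation of a 10-second task.
metrics = fluency_metrics(
    human_motion=[(0, 4), (6, 10)],
    robot_motion=[(2, 7)],
    task_start=0,
    task_end=10,
)
```

Percent changes such as those reported above would then be obtained by comparing the mean of each metric between the two experimental conditions.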

Conclusions 

This paper presents results from ongoing work aimed at
translating qualitative methods from human factors
engineering and human team coordination into computational
models that can be applied to human-machine team planning.
We describe results from two studies. The first applies
statistical machine learning to quantitatively predict weak
and strong agreement in human planning meetings. The
approach uses statistical analysis of dialog features, which
prior studies in cognitive psychology have shown to
qualitatively capture the level of commitment to plan
choices. Initial results indicate that we can achieve up to
94% average accuracy in predicting strong and weak
agreement. The second study designs a computational teaming
model and formulates a human-robot interactive planning
method that emulates cross-training, a training strategy
widely used in human teams. Results from human subject
experiments show statistically significant improvements on
team fluency metrics, compared to standard reinforcement
learning techniques. These results support the approach of
modeling and applying common practices in human teaming to
achieve more effective and fluent human-robot teaming.